Reranking-based Crash Report Deduplication

Authors

  • Akira Moroo
  • Akiko Aizawa
  • Takayuki Hamamoto
Abstract

Software projects collect vast numbers of crash reports from users and deduplicate them so that bugs can be fixed efficiently. However, most existing automated methods suffer from performance problems during large-scale clustering. We propose a reranking-based crash report clustering method that combines two earlier methods. By computing the similarity used in ReBucket only for the crash reports that are highly similar to the query crash report, the method can process reports with throughput equal to that of PartyCrasher. We also introduce an automatically generated dataset for crash report clustering tasks. The evaluation showed that our method achieves high processing speed while maintaining high accuracy.
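
The two-stage idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cheap_score` (a Jaccard overlap of stack frames) is a hypothetical stand-in for a fast PartyCrasher-style retrieval step, `aligned_score` is a simplified position-weighted alignment written in the spirit of ReBucket's similarity, and the parameters `c`, `o`, and `k` as well as the sample stack traces are invented for the example.

```python
import math

def cheap_score(query, candidate):
    """Stage 1 (assumed stand-in): inexpensive set overlap of frame names."""
    q, c = set(query), set(candidate)
    union = q | c
    return len(q & c) / len(union) if union else 0.0

def aligned_score(a, b, c=0.5, o=0.5):
    """Stage 2 (assumed stand-in): position-weighted alignment of two stacks.

    A matching frame contributes more when it is near the top of the stack
    (weight c) and when it sits at a similar depth in both stacks (weight o).
    The best alignment is found with an LCS-style dynamic program.
    """
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        return 0.0
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            gain = 0.0
            if a[i - 1] == b[j - 1]:
                gain = math.exp(-c * min(i - 1, j - 1)) * math.exp(-o * abs(i - j))
            dp[i][j] = max(dp[i - 1][j - 1] + gain, dp[i - 1][j], dp[i][j - 1])
    # Normalize by the best score a stack of the shorter length could reach.
    norm = sum(math.exp(-c * t) for t in range(min(m, n)))
    return dp[m][n] / norm

def rerank(query, corpus, k=2):
    """Shortlist k candidates cheaply, then rescore only those k expensively."""
    shortlist = sorted(corpus, key=lambda rid: cheap_score(query, corpus[rid]),
                       reverse=True)[:k]
    scored = [(rid, aligned_score(query, corpus[rid])) for rid in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical stack traces, top frame first.
corpus = {
    "r1": ["malloc", "parse", "load", "main"],
    "r2": ["malloc", "parse", "init", "main"],
    "r3": ["draw", "paint", "loop", "main"],
}
query = ["malloc", "parse", "load", "main"]
ranked = rerank(query, corpus, k=2)
print(ranked)
```

The design point is that the quadratic-time alignment runs on at most `k` reports per query rather than on the whole corpus, which is why throughput can stay close to that of the cheap first stage.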


Similar resources

The Unreasonable Effectiveness of Traditional Information Retrieval in Crash

Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of softwa...


Discriminative Reranking for Semantic Parsing

Semantic parsing is the task of mapping natural language sentences to complete formal meaning representations. The performance of semantic parsing can be potentially improved by using discriminative reranking, which explores arbitrary global features. In this paper, we investigate discriminative reranking upon a baseline semantic parser, SCISSOR, where the composition of meaning representations...


A Survey On Visual Search Reranking

Due to the explosive growth of online video data and images, visual search is becoming an important area of research. Most existing approaches use text-based image retrieval, which is not very efficient. To specify visual documents precisely, visual search reranking is used. Visual search reranking is the rearrangement of visual documents based on initial search results or some external know...


HTTP-Level Deduplication with HTML5

In this project, we examine HTTP-level duplication. We first report on our initial measurement study, analyzing the amount and types of duplication in the Internet today. We then discuss several opportunities for deduplication: in particular, we implement two versions of a simple server-client architecture that takes advantage of HTML5 client-side storage for value-based caching and deduplication.


Consensus in Asynchronous Systems Where Processes Can Crash and Recover

The Consensus problem is now well identified as being one of the most important problems encountered in the design and the construction of fault-tolerant distributed systems. This problem is defined as follows: processes have to reach a common decision, which depends on their inputs, despite failures. We consider the Consensus problem in asynchronous distributed systems augmented with unreliabl...



Publication date: 2017